19 research outputs found

    Dissection of a Bug Dataset: Anatomy of 395 Patches from Defects4J

    Well-designed and publicly available datasets of bugs are an invaluable asset for advancing research fields such as fault localization and program repair, as they allow direct and fair comparison between competing techniques as well as the replication of experiments. These datasets need to be deeply understood by researchers: the answers to questions like "which bugs can my technique handle?" and "for which bugs is my technique effective?" depend on understanding the properties of bugs and their patches. However, such properties are usually not included in the datasets, and there is still no widely adopted methodology for characterizing bugs and patches. In this work, we deeply study 395 patches of the Defects4J dataset. Quantitative properties (patch size and spreading) were automatically extracted, whereas qualitative ones (repair actions and patterns) were manually extracted using a thematic analysis-based approach. We found that 1) the median size of Defects4J patches is four lines, and almost 30% of the patches contain only added lines; 2) 92% of the patches change only one file, and 38% have no spreading at all; 3) the top-3 most applied repair actions are addition of method calls, conditionals, and assignments, occurring in 77% of the patches; and 4) nine repair patterns were found for 95% of the patches, where the most prevalent, appearing in 43% of the patches, is on conditional blocks. These results are useful for researchers to perform advanced analysis on their techniques' results based on Defects4J. Moreover, our set of properties can be used to characterize and compare different bug datasets.
    Comment: Accepted for SANER'18 (25th edition of IEEE International Conference on Software Analysis, Evolution and Reengineering), Campobasso, Italy

    A Systematic Literature Review on the Impact of Formatting Elements on Code Legibility

    Context: Software programs can be written in different but functionally equivalent ways. Even though previous research has compared specific formatting elements to find out which alternatives affect code legibility, seeing the bigger picture of what makes code more or less legible is challenging. Goal: We aim to find which formatting elements have been investigated in empirical studies and which alternatives were found to be more legible for human subjects. Method: We conducted a systematic literature review and identified 15 papers containing human-centric studies that directly compared alternative formatting elements. We analyzed and organized these formatting elements using a card-sorting method. Results: We identified 13 formatting elements (e.g., indentation) and 33 levels of formatting elements (e.g., two-space indentation), which are about formatting styles, spacing, block delimiters, long or complex code lines, and word boundary styles. While some levels were found to be statistically better than other equivalent ones in terms of code legibility, e.g., appropriate use of indentation with blocks, others were not, e.g., formatting layout. For identifier style, we found divergent results, where one study found a significant difference in favor of camel case, while another study found a positive result in favor of snake case. Conclusion: The number of identified papers, some of which are outdated, and the many null and contradictory results emphasize the relative lack of work in this area and underline the importance of more research. There is much to be understood about how formatting elements influence code legibility before the creation of guidelines and automated aids to help developers make their code more legible.
    Comment: Journal of Systems and Software

    An approach using software visualization to support refactoring to aspects (original title: Uma abordagem usando visualização de software como apoio à refatoração para aspectos)

    The first step in evolving existing software systems to aspect-oriented technology is aspect mining, which aims to identify crosscutting concerns in source code so that they can be encapsulated into aspects. Several techniques have been proposed for aspect mining, but they still have shortcomings. One cause of these shortcomings pointed out in the literature is the inadequate presentation of the results obtained by these techniques. Software visualization can be used to analyze, interpret, and combine the results of aspect mining techniques, with the results presented together with program characteristics. This work presents a visual approach based on multiple coordinated views, intended to provide an environment for presenting the results generated by aspect mining techniques and to improve the user's understanding when analyzing them for future refactoring to aspects. The multiple coordinated views allow the analysis of: associations based on method calls, at class and method levels, allowing visualization of the call frequency of units based on the fan-in metric; the control and data dependencies between program instructions; the program structure; how instruction sets (slices) are composed across several classes; and the bytecode. The focus is to investigate whether visualization helps program comprehension through the results generated by program slicing and fan-in analysis, two techniques proposed for mining aspects, used in a complementary way. A software visualization tool, named SoftVis4CA (Software Visualization for Code Analysis), was developed to support the proposed visual approach. A preliminary study showed that the proposed coordination model supports analysis through the exploration of different levels of detail.
    Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)